@JackTemaki
Contributor

Basic jobs to use Loquacious on our standard pipelines. So far only contains jobs for dev/test as well as small and medium train sets. The large corpus needs extra handling.

The jobs require an existing huggingface cache directory.

JackTemaki and others added 2 commits September 29, 2025 09:57

Co-authored-by: Nick Rossenbach <[email protected]>
Co-authored-by: Robin Schmitt <[email protected]>
"-c:a",
"libvorbis",
"-b:a",
"16k",
Member

I was checking some other examples.

What I also see:
["ffmpeg", "-y", "-f", "s16le", "-ar", "%i" % sr, "-i", "pipe:0", "-c:a", "libvorbis", "-q", "3.0", path]
(That's what you use for your TTS Ogg export.)
["ffmpeg", "-hide_banner", "-loglevel", "error", "-y", "-threads", "1", "-f", "s16le", "-ar", "%i" % sr, "-i", "pipe:0", "-c:a", "libvorbis", "-q", "3.0", path]

Or in i6_experiments.common.datasets.librispeech.corpus.get_bliss_corpus_dict and i6_experiments.common.datasets.tedlium2.corpus.get_bliss_corpus_dict (and many others), we use "output_format": "ogg", "codec": "libvorbis" and sample_rate=16000 for BlissChangeEncodingJob. I wonder a bit about that: Here we don't specify the quality at all (neither -b:a nor -q), as far as I can see?

I have not seen any other example using -b:a. This corresponds to the fixed_bitrate option in BlissChangeEncodingJob.

Targeting a fixed average bitrate (ABR, option -b:a) seems suboptimal to me. Wouldn't a quality-based variable bitrate (VBR, option -q) make more sense?

But I just see that -q 3 is already the default. And you said that the FFmpeg defaults are suboptimal? Maybe we should use -q 4 or higher?
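
To make the two variants concrete, here is a minimal sketch of a helper that builds the FFmpeg argument list for either quality-based VBR or a target bitrate. The function name `build_vorbis_cmd` and its parameters are hypothetical and not part of this PR or of i6_core. It only assembles the arguments (using `-q:a`, the audio-stream form of `-q`) and does not run FFmpeg.

```python
from typing import List, Optional


def build_vorbis_cmd(
    path: str,
    sample_rate: int,
    quality: Optional[float] = None,
    bitrate: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg command that reads raw s16le PCM from stdin and writes Ogg Vorbis.

    Hypothetical helper for illustration only.
    Pass either `quality` (VBR, e.g. 4.0) or `bitrate` (e.g. "16k"), not both.
    If neither is given, the encoder default applies.
    """
    if quality is not None and bitrate is not None:
        raise ValueError("specify either quality (VBR) or bitrate, not both")
    cmd = [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-f", "s16le", "-ar", str(sample_rate), "-i", "pipe:0",
        "-c:a", "libvorbis",
    ]
    if quality is not None:
        cmd += ["-q:a", str(quality)]  # quality-based VBR
    elif bitrate is not None:
        cmd += ["-b:a", bitrate]  # target bitrate, as in the diff above
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    print(build_vorbis_cmd("out.ogg", 16000, quality=4.0))
    print(build_vorbis_cmd("out.ogg", 16000, bitrate="16k"))
```

Keeping the two options mutually exclusive makes it explicit which rate-control mode a given export uses.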

Member

Oh, I just learned: If you don't pass -c:a libvorbis, and you have some stripped-down FFmpeg build that was built without libvorbis, then FFmpeg still provides a builtin vorbis encoder. So it can still generate Ogg files, but the quality will be (much?) lower.

In some older setups, I did not use -c:a libvorbis, but simply ffmpeg -i ... out...ogg. But I think in most of my environments, I always had my custom Linuxbrew ffmpeg installed, which should have libvorbis enabled.

But now, at the RWTH HPC cluster, the FFmpeg provided there (which is also only available after module load FFmpeg) does not support libvorbis.
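
One way to catch this early would be to check the encoder list of the FFmpeg binary before starting any export. The following is a rough sketch, assuming the usual `ffmpeg -encoders` output layout (a flags column followed by the encoder name). The function names `has_encoder` and `ffmpeg_has_encoder` are hypothetical and not part of this PR.

```python
import subprocess


def has_encoder(encoders_output: str, name: str) -> bool:
    """Return True if `name` appears as an encoder name in `ffmpeg -encoders` output.

    Assumes each encoder line looks like: " A..... libvorbis   libvorbis",
    i.e. the second whitespace-separated field is the encoder name.
    """
    for line in encoders_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def ffmpeg_has_encoder(name: str, ffmpeg_binary: str = "ffmpeg") -> bool:
    """Query the given ffmpeg binary for its encoders and look for `name`."""
    result = subprocess.run(
        [ffmpeg_binary, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_encoder(result.stdout, name)


if __name__ == "__main__":
    if not ffmpeg_has_encoder("libvorbis"):
        raise RuntimeError("this ffmpeg build was compiled without libvorbis")
```

A job could run such a check once up front and fail with a clear message instead of silently falling back to a different encoder.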
